An investigation of documents from the World Wide Web
Similar resources
An Investigation of Documents from the World Wide Web
We report on our examination of pages from the World Wide Web. We have analyzed data collected by the Inktomi Web crawler (this data currently comprises over 2.6 million HTML documents). We have examined many characteristics of these documents, including: document size; number and types of tags, attributes, file extensions, protocols, and ports; the number of in-links; and the ratio of document...
Replicated Documents for the World-wide Web
The number of users of the World-Wide Web (WWW) increases on a daily basis, thereby consuming more and more of the Internet's restricted resources. A large fraction of the workload consists of retrieving WWW documents from remote servers. Since WWW documents are generally updated infrequently but retrieved extremely often, providing multiple copies of these documents is the almost natural way to c...
Visual Definition of Virtual Documents for the World-Wide Web
Trying to support the presentation of large amounts of heterogeneous data on the World-Wide Web normally results in relocating and restructuring the original data. Our approach avoids these disadvantages by generating metadata imposing an arbitrary logical structure on existing and new data. This paper proposes a new high-level visual language as a user-friendly means to control the process of ...
Putting Paper Documents in the World-Wide Web
Adding hypertext links to a digital library that consists of scanned-in paper documents is a sensible way to enhance the library's functionality by supporting a more intuitive kind of searching. Such a hypertext structure can be generated automatically from the information delivered by OCR software. Of course, the hypertext presentation could also be based on the pure charact...
Querying Semantically Tagged Documents on the World-Wide Web
QUEST is a system for Querying Semantically Tagged documents on the World-Wide Web. The advent of new markup languages, such as XML, facilitates authoring of Web documents that contain not just HTML tags for instructing a browser how to view a document, but also objects that represent the semantic structure of the document. When such documents become widely available, more powerful meth...
Journal
Journal title: Computer Networks and ISDN Systems
Year: 1996
ISSN: 0169-7552
DOI: 10.1016/0169-7552(96)00064-5